Overview

Dataset statistics

Number of variables: 18
Number of observations: 1017209
Missing cells: 2173431
Missing cells (%): 11.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 139.7 MiB
Average record size in memory: 144.0 B
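
The headline figures above can be cross-checked against the per-column numbers reported elsewhere in this report. The sketch below is plain arithmetic on values copied from the report; it assumes the missing-cell percentage is taken over all cells (variables × observations) and that a MiB is 1024² bytes.

```python
# Figures copied from the report; nothing here touches the underlying data.
n_variables = 18
n_observations = 1_017_209
total_size_mib = 139.7

# Missing counts per column, as listed in the Alerts and Variables sections.
missing_by_column = {
    "CompetitionDistance": 2_642,
    "CompetitionOpenSinceMonth": 323_348,
    "CompetitionOpenSinceYear": 323_348,
    "Promo2SinceWeek": 508_031,
    "Promo2SinceYear": 508_031,
    "PromoInterval": 508_031,
}

missing_cells = sum(missing_by_column.values())
# Assumption: the percentage is relative to every cell in the table.
missing_pct = 100 * missing_cells / (n_variables * n_observations)
# Assumption: 1 MiB = 1024**2 bytes.
avg_record_bytes = total_size_mib * 1024**2 / n_observations

print(missing_cells)                # 2173431
print(f"{missing_pct:.1f}%")        # 11.9%
print(f"{avg_record_bytes:.1f} B")  # 144.0 B
```

All three results agree with the reported values, so the six columns listed account for every missing cell.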

Variable types

Numeric: 9
DateTime: 1
Categorical: 8

Alerts

StateHoliday is highly imbalanced (88.2%) [Imbalance]
CompetitionOpenSinceMonth has 323348 (31.8%) missing values [Missing]
CompetitionOpenSinceYear has 323348 (31.8%) missing values [Missing]
Promo2SinceWeek has 508031 (49.9%) missing values [Missing]
Promo2SinceYear has 508031 (49.9%) missing values [Missing]
PromoInterval has 508031 (49.9%) missing values [Missing]
Sales has 172871 (17.0%) zeros [Zeros]
Customers has 172869 (17.0%) zeros [Zeros]
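
The report does not say how the 88.2% imbalance figure is derived. One formula that reproduces it from the StateHoliday counts is one minus the Shannon entropy of the class distribution, normalized by the maximum entropy for that number of classes. The sketch below treats that formula as an assumption; `imbalance_score` is an illustrative name, not part of any library.

```python
import math

def imbalance_score(counts):
    """One minus normalized Shannon entropy (assumed definition).

    0.0 means perfectly balanced classes, 1.0 means a single class.
    """
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    max_entropy = math.log2(len(counts))
    if max_entropy == 0:  # a single class has no balance to measure
        return 0.0
    return 1 - entropy / max_entropy

# Category counts copied from the StateHoliday section of the report.
state_holiday_counts = {"0": 986_159, "a": 20_260, "b": 6_690, "c": 4_100}
score = imbalance_score(list(state_holiday_counts.values()))
print(f"{score:.1%}")  # 88.2%
```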

Reproduction

Analysis started: 2024-01-12 17:55:33.883842
Analysis finished: 2024-01-12 17:55:58.466092
Duration: 24.58 seconds
Software version: ydata-profiling v4.6.4
Download configuration: config.json

Variables

Store
Real number (ℝ)

Distinct: 1115
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 558.42973
Minimum: 1
Maximum: 1115
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.8 MiB

Quantile statistics

Minimum: 1
5-th percentile: 56
Q1: 280
median: 558
Q3: 838
95-th percentile: 1060
Maximum: 1115
Range: 1114
Interquartile range (IQR): 558

Descriptive statistics

Standard deviation: 321.90865
Coefficient of variation (CV): 0.57645329
Kurtosis: -1.2005237
Mean: 558.42973
Median Absolute Deviation (MAD): 279
Skewness: -0.00095487998
Sum: 5.6803974 × 10^8
Variance: 103625.18
Monotonicity: Not monotonic
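
Several statistics in this block are functions of others: the range and IQR follow from the quantiles, the CV is the standard deviation divided by the mean, and the variance is the squared standard deviation. A minimal sketch of that consistency check, using the Store values from this report (`derived` is an illustrative helper name):

```python
import math

# Values copied from the Store block of the report.
store = {
    "min": 1, "max": 1115, "q1": 280, "q3": 838,
    "mean": 558.42973, "std": 321.90865,
    "range": 1114, "iqr": 558, "cv": 0.57645329, "variance": 103625.18,
}

def derived(stats):
    """Recompute the derived statistics from the primary ones."""
    return {
        "range": stats["max"] - stats["min"],
        "iqr": stats["q3"] - stats["q1"],
        "cv": stats["std"] / stats["mean"],
        "variance": stats["std"] ** 2,
    }

recomputed = derived(store)
# The report prints about 8 significant digits, so compare with a tolerance.
consistent = all(
    math.isclose(recomputed[key], store[key], rel_tol=1e-6) for key in recomputed
)
print(consistent)
```

The same check can be repeated for any other numeric variable by swapping in its values.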

[Figure: Histogram with fixed size bins (bins=50)]

Most frequent values

Value                 Count     Frequency (%)
1                     942       0.1%
726                   942       0.1%
708                   942       0.1%
709                   942       0.1%
713                   942       0.1%
714                   942       0.1%
715                   942       0.1%
717                   942       0.1%
718                   942       0.1%
720                   942       0.1%
Other values (1105)   1007789   99.1%

Smallest values

Value                 Count     Frequency (%)
1                     942       0.1%
2                     942       0.1%
3                     942       0.1%
4                     942       0.1%
5                     942       0.1%
6                     942       0.1%
7                     942       0.1%
8                     942       0.1%
9                     942       0.1%
10                    942       0.1%

Largest values

Value                 Count     Frequency (%)
1115                  942       0.1%
1114                  942       0.1%
1113                  942       0.1%
1112                  942       0.1%
1111                  942       0.1%
1110                  942       0.1%
1109                  758       0.1%
1108                  942       0.1%
1107                  758       0.1%
1106                  942       0.1%

DayOfWeek
Real number (ℝ)

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.9983406
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.8 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 4
Q3: 6
95-th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 1.997391
Coefficient of variation (CV): 0.49955499
Kurtosis: -1.2468733
Mean: 3.9983406
Median Absolute Deviation (MAD): 2
Skewness: 0.0015928228
Sum: 4067148
Variance: 3.9895707
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=7)]

Most frequent values

Value   Count    Frequency (%)
5       145845   14.3%
4       145845   14.3%
3       145665   14.3%
2       145664   14.3%
1       144730   14.2%
7       144730   14.2%
6       144730   14.2%

Smallest values

Value   Count    Frequency (%)
1       144730   14.2%
2       145664   14.3%
3       145665   14.3%
4       145845   14.3%
5       145845   14.3%
6       144730   14.2%
7       144730   14.2%

Largest values

Value   Count    Frequency (%)
7       144730   14.2%
6       144730   14.2%
5       145845   14.3%
4       145845   14.3%
3       145665   14.3%
2       145664   14.3%
1       144730   14.2%
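
Because DayOfWeek has only seven distinct values, its frequency table is complete, and the reported sum, mean, and variance can be recomputed from the counts alone. The sketch below assumes the variance uses the sample divisor (n - 1); that choice is an assumption, although at this sample size it barely changes the result.

```python
# Value -> count, copied from the DayOfWeek frequency table.
day_counts = {
    1: 144_730, 2: 145_664, 3: 145_665, 4: 145_845,
    5: 145_845, 6: 144_730, 7: 144_730,
}

n = sum(day_counts.values())
total = sum(value * count for value, count in day_counts.items())
mean = total / n
# Assumption: sample variance, i.e. squared deviations divided by n - 1.
variance = sum(
    count * (value - mean) ** 2 for value, count in day_counts.items()
) / (n - 1)

print(n, total)          # 1017209 4067148
print(round(mean, 7))
print(round(variance, 4))
```

The count and sum match the report exactly, which also confirms that the seven rows cover every observation.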

Date
Date

Distinct: 942
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.8 MiB
Minimum: 2013-01-01 00:00:00
Maximum: 2015-07-31 00:00:00

[Figure: Histogram with fixed size bins (bins=50)]
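
The distinct-date count can be compared with the length of the calendar span between the minimum and maximum dates. The sketch below uses only figures from this report (1115 distinct stores, 1017209 observations) and shows that every calendar day in the range occurs, while the store-by-day panel is not complete.

```python
from datetime import date

first_day = date(2013, 1, 1)   # reported Minimum
last_day = date(2015, 7, 31)   # reported Maximum
distinct_dates = 942
n_stores = 1115
n_observations = 1_017_209

# Inclusive number of calendar days between the two endpoints.
calendar_days = (last_day - first_day).days + 1
# Rows a complete store x day panel would contain.
full_panel_rows = n_stores * calendar_days
missing_store_days = full_panel_rows - n_observations

print(calendar_days == distinct_dates)  # True
print(missing_store_days)               # 33121
```

The shortfall is consistent with the Store frequency tables, where some stores have 758 records rather than 942.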

Sales
Real number (ℝ)

Alerts: Zeros

Distinct: 21734
Distinct (%): 2.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5773.819
Minimum: 0
Maximum: 41551
Zeros: 172871
Zeros (%): 17.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 3727
median: 5744
Q3: 7856
95-th percentile: 12137
Maximum: 41551
Range: 41551
Interquartile range (IQR): 4129

Descriptive statistics

Standard deviation: 3849.9262
Coefficient of variation (CV): 0.66679025
Kurtosis: 1.7783747
Mean: 5773.819
Median Absolute Deviation (MAD): 2067
Skewness: 0.64145962
Sum: 5.8731806 × 10^9
Variance: 14821932
Monotonicity: Not monotonic
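
With 17.0% of the rows at zero, the overall mean understates sales on days with any revenue. The mean over non-zero rows can be approximated from the reported sum and zero count. This is a rough sketch: the sum is printed to only eight significant digits, so the result is approximate.

```python
n_observations = 1_017_209
zero_sales_rows = 172_871
sales_sum = 5.8731806e9  # printed to 8 significant digits in the report

mean_all = sales_sum / n_observations
# Zero rows add nothing to the sum, so only the divisor changes.
mean_nonzero = sales_sum / (n_observations - zero_sales_rows)

print(round(mean_all, 1))
print(round(mean_nonzero))
```

The first value reproduces the reported mean of about 5773.8; the second comes out a little under 7000.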

[Figure: Histogram with fixed size bins (bins=50)]

Most frequent values

Value                  Count    Frequency (%)
0                      172871   17.0%
5674                   215      < 0.1%
5558                   197      < 0.1%
5483                   196      < 0.1%
6214                   195      < 0.1%
6049                   195      < 0.1%
5723                   194      < 0.1%
5449                   192      < 0.1%
5140                   191      < 0.1%
5489                   191      < 0.1%
Other values (21724)   842572   82.8%

Smallest values

Value                  Count    Frequency (%)
0                      172871   17.0%
46                     1        < 0.1%
124                    1        < 0.1%
133                    1        < 0.1%
286                    1        < 0.1%
297                    1        < 0.1%
316                    1        < 0.1%
416                    1        < 0.1%
506                    1        < 0.1%
520                    1        < 0.1%

Largest values

Value                  Count    Frequency (%)
41551                  1        < 0.1%
38722                  1        < 0.1%
38484                  1        < 0.1%
38367                  1        < 0.1%
38037                  1        < 0.1%
38025                  1        < 0.1%
37646                  1        < 0.1%
37403                  1        < 0.1%
37376                  1        < 0.1%
37122                  1        < 0.1%

Customers
Real number (ℝ)

Alerts: Zeros

Distinct: 4086
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 633.14595
Minimum: 0
Maximum: 7388
Zeros: 172869
Zeros (%): 17.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 405
median: 609
Q3: 837
95-th percentile: 1362
Maximum: 7388
Range: 7388
Interquartile range (IQR): 432

Descriptive statistics

Standard deviation: 464.41173
Coefficient of variation (CV): 0.73349871
Kurtosis: 7.0917727
Mean: 633.14595
Median Absolute Deviation (MAD): 216
Skewness: 1.5986503
Sum: 6.4404176 × 10^8
Variance: 215678.26
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Most frequent values

Value                 Count    Frequency (%)
0                     172869   17.0%
560                   2414     0.2%
576                   2363     0.2%
603                   2337     0.2%
571                   2330     0.2%
555                   2328     0.2%
566                   2327     0.2%
517                   2326     0.2%
539                   2309     0.2%
651                   2299     0.2%
Other values (4076)   823307   80.9%

Smallest values

Value                 Count    Frequency (%)
0                     172869   17.0%
3                     1        < 0.1%
5                     1        < 0.1%
8                     1        < 0.1%
13                    1        < 0.1%
18                    1        < 0.1%
36                    1        < 0.1%
40                    1        < 0.1%
44                    1        < 0.1%
50                    1        < 0.1%

Largest values

Value                 Count    Frequency (%)
7388                  1        < 0.1%
5494                  1        < 0.1%
5458                  1        < 0.1%
5387                  1        < 0.1%
5297                  1        < 0.1%
5192                  1        < 0.1%
5152                  1        < 0.1%
5145                  1        < 0.1%
5132                  1        < 0.1%
5112                  1        < 0.1%

Open
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.8 MiB
Category counts: 1 (844392), 0 (172817)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1017209
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value   Count    Frequency (%)
1       844392   83.0%
0       172817   17.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value   Count    Frequency (%)
1       844392   83.0%
0       172817   17.0%

Most occurring characters

Value   Count    Frequency (%)
1       844392   83.0%
0       172817   17.0%

Most occurring categories

Value            Count     Frequency (%)
Decimal Number   1017209   100.0%

Most frequent character per category

Decimal Number
Value   Count    Frequency (%)
1       844392   83.0%
0       172817   17.0%

Most occurring scripts

Value    Count     Frequency (%)
Common   1017209   100.0%

Most frequent character per script

Common
Value   Count    Frequency (%)
1       844392   83.0%
0       172817   17.0%

Most occurring blocks

Value   Count     Frequency (%)
ASCII   1017209   100.0%

Most frequent character per block

ASCII
Value   Count    Frequency (%)
1       844392   83.0%
0       172817   17.0%
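
The number of closed rows (Open = 0) is close to, but not the same as, the zero counts flagged for Sales and Customers. The sketch below takes the difference, assuming every closed row has zero sales and zero customers. The report does not state this, so the results are lower bounds if the assumption fails.

```python
closed_rows = 172_817         # Open == 0
zero_sales_rows = 172_871     # Sales zeros
zero_customer_rows = 172_869  # Customers zeros

# Assumption: all closed rows are among the zero rows.
open_with_zero_sales = zero_sales_rows - closed_rows
open_with_zero_customers = zero_customer_rows - closed_rows

print(open_with_zero_sales)      # 54
print(open_with_zero_customers)  # 52
```

Under that assumption, a few dozen rows record an open store with no sales, which may be worth inspecting before modelling.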

Promo
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.8 MiB
Category counts: 0 (629129), 1 (388080)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1017209
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value   Count    Frequency (%)
0       629129   61.8%
1       388080   38.2%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value   Count    Frequency (%)
0       629129   61.8%
1       388080   38.2%

Most occurring characters

Value   Count    Frequency (%)
0       629129   61.8%
1       388080   38.2%

Most occurring categories

Value            Count     Frequency (%)
Decimal Number   1017209   100.0%

Most frequent character per category

Decimal Number
Value   Count    Frequency (%)
0       629129   61.8%
1       388080   38.2%

Most occurring scripts

Value    Count     Frequency (%)
Common   1017209   100.0%

Most frequent character per script

Common
Value   Count    Frequency (%)
0       629129   61.8%
1       388080   38.2%

Most occurring blocks

Value   Count     Frequency (%)
ASCII   1017209   100.0%

Most frequent character per block

ASCII
Value   Count    Frequency (%)
0       629129   61.8%
1       388080   38.2%

StateHoliday
Categorical

Alerts: Imbalance

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.8 MiB
Category counts: 0 (986159), a (20260), b (6690), c (4100)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1017209
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value   Count    Frequency (%)
0       986159   96.9%
a       20260    2.0%
b       6690     0.7%
c       4100     0.4%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value   Count    Frequency (%)
0       986159   96.9%
a       20260    2.0%
b       6690     0.7%
c       4100     0.4%

Most occurring characters

Value   Count    Frequency (%)
0       986159   96.9%
a       20260    2.0%
b       6690     0.7%
c       4100     0.4%

Most occurring categories

Value              Count    Frequency (%)
Decimal Number     986159   96.9%
Lowercase Letter   31050    3.1%

Most frequent character per category

Lowercase Letter
Value   Count   Frequency (%)
a       20260   65.2%
b       6690    21.5%
c       4100    13.2%

Decimal Number
Value   Count    Frequency (%)
0       986159   100.0%

Most occurring scripts

Value    Count    Frequency (%)
Common   986159   96.9%
Latin    31050    3.1%

Most frequent character per script

Latin
Value   Count   Frequency (%)
a       20260   65.2%
b       6690    21.5%
c       4100    13.2%

Common
Value   Count    Frequency (%)
0       986159   100.0%

Most occurring blocks

Value   Count     Frequency (%)
ASCII   1017209   100.0%

Most frequent character per block

ASCII
Value   Count    Frequency (%)
0       986159   96.9%
a       20260    2.0%
b       6690     0.7%
c       4100     0.4%

SchoolHoliday
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.8 MiB
Category counts: 0 (835488), 1 (181721)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1017209
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value   Count    Frequency (%)
0       835488   82.1%
1       181721   17.9%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value   Count    Frequency (%)
0       835488   82.1%
1       181721   17.9%

Most occurring characters

Value   Count    Frequency (%)
0       835488   82.1%
1       181721   17.9%

Most occurring categories

Value            Count     Frequency (%)
Decimal Number   1017209   100.0%

Most frequent character per category

Decimal Number
Value   Count    Frequency (%)
0       835488   82.1%
1       181721   17.9%

Most occurring scripts

Value    Count     Frequency (%)
Common   1017209   100.0%

Most frequent character per script

Common
Value   Count    Frequency (%)
0       835488   82.1%
1       181721   17.9%

Most occurring blocks

Value   Count     Frequency (%)
ASCII   1017209   100.0%

Most frequent character per block

ASCII
Value   Count    Frequency (%)
0       835488   82.1%
1       181721   17.9%

StoreType
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.8 MiB
Category counts: a (551627), d (312912), c (136840), b (15830)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1017209
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: c
2nd row: a
3rd row: a
4th row: c
5th row: a

Common Values

Value   Count    Frequency (%)
a       551627   54.2%
d       312912   30.8%
c       136840   13.5%
b       15830    1.6%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value   Count    Frequency (%)
a       551627   54.2%
d       312912   30.8%
c       136840   13.5%
b       15830    1.6%

Most occurring characters

Value   Count    Frequency (%)
a       551627   54.2%
d       312912   30.8%
c       136840   13.5%
b       15830    1.6%

Most occurring categories

Value              Count     Frequency (%)
Lowercase Letter   1017209   100.0%

Most frequent character per category

Lowercase Letter
Value   Count    Frequency (%)
a       551627   54.2%
d       312912   30.8%
c       136840   13.5%
b       15830    1.6%

Most occurring scripts

Value   Count     Frequency (%)
Latin   1017209   100.0%

Most frequent character per script

Latin
Value   Count    Frequency (%)
a       551627   54.2%
d       312912   30.8%
c       136840   13.5%
b       15830    1.6%

Most occurring blocks

Value   Count     Frequency (%)
ASCII   1017209   100.0%

Most frequent character per block

ASCII
Value   Count    Frequency (%)
a       551627   54.2%
d       312912   30.8%
c       136840   13.5%
b       15830    1.6%

Assortment
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.8 MiB
Category counts: a (537445), c (471470), b (8294)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1017209
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: a
2nd row: a
3rd row: a
4th row: c
5th row: a

Common Values

Value   Count    Frequency (%)
a       537445   52.8%
c       471470   46.3%
b       8294     0.8%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value   Count    Frequency (%)
a       537445   52.8%
c       471470   46.3%
b       8294     0.8%

Most occurring characters

Value   Count    Frequency (%)
a       537445   52.8%
c       471470   46.3%
b       8294     0.8%

Most occurring categories

Value              Count     Frequency (%)
Lowercase Letter   1017209   100.0%

Most frequent character per category

Lowercase Letter
Value   Count    Frequency (%)
a       537445   52.8%
c       471470   46.3%
b       8294     0.8%

Most occurring scripts

Value   Count     Frequency (%)
Latin   1017209   100.0%

Most frequent character per script

Latin
Value   Count    Frequency (%)
a       537445   52.8%
c       471470   46.3%
b       8294     0.8%

Most occurring blocks

Value   Count     Frequency (%)
ASCII   1017209   100.0%

Most frequent character per block

ASCII
Value   Count    Frequency (%)
a       537445   52.8%
c       471470   46.3%
b       8294     0.8%

CompetitionDistance
Real number (ℝ)

Distinct: 654
Distinct (%): 0.1%
Missing: 2642
Missing (%): 0.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 5430.0857
Minimum: 20
Maximum: 75860
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.8 MiB

Quantile statistics

Minimum: 20
5-th percentile: 130
Q1: 710
median: 2330
Q3: 6890
95-th percentile: 20390
Maximum: 75860
Range: 75840
Interquartile range (IQR): 6180

Descriptive statistics

Standard deviation: 7715.3237
Coefficient of variation (CV): 1.4208475
Kurtosis: 13.000022
Mean: 5430.0857
Median Absolute Deviation (MAD): 1980
Skewness: 2.928534
Sum: 5.5091857 × 10^9
Variance: 59526220
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Most frequent values

Value                Count    Frequency (%)
250                  11120    1.1%
50                   7536     0.7%
350                  7536     0.7%
1200                 7374     0.7%
190                  7352     0.7%
180                  6594     0.6%
90                   6594     0.6%
330                  6410     0.6%
150                  6226     0.6%
2640                 5652     0.6%
Other values (644)   942173   92.6%

Smallest values

Value                Count    Frequency (%)
20                   942      0.1%
30                   3767     0.4%
40                   4710     0.5%
50                   7536     0.7%
60                   2826     0.3%
70                   4526     0.4%
80                   2826     0.3%
90                   6594     0.6%
100                  4710     0.5%
110                  5468     0.5%

Largest values

Value                Count    Frequency (%)
75860                942      0.1%
58260                942      0.1%
48330                942      0.1%
46590                942      0.1%
45740                942      0.1%
44320                942      0.1%
40860                942      0.1%
40540                942      0.1%
38710                942      0.1%
38630                942      0.1%

CompetitionOpenSinceMonth
Real number (ℝ)

Alerts: Missing

Distinct: 12
Distinct (%): < 0.1%
Missing: 323348
Missing (%): 31.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.222866
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.8 MiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 4
median: 8
Q3: 10
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.2118321
Coefficient of variation (CV): 0.44467558
Kurtosis: -1.248357
Mean: 7.222866
Median Absolute Deviation (MAD): 3
Skewness: -0.16986163
Sum: 5011665
Variance: 10.315866
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=12)]

Most frequent values

Value              Count    Frequency (%)
9                  114254   11.2%
4                  87076    8.6%
11                 84455    8.3%
3                  63548    6.2%
7                  59434    5.8%
12                 57896    5.7%
10                 55622    5.5%
6                  45444    4.5%
5                  39608    3.9%
2                  37886    3.7%
Other values (2)   48638    4.8%
(Missing)          323348   31.8%

Smallest values

Value              Count    Frequency (%)
1                  12452    1.2%
2                  37886    3.7%
3                  63548    6.2%
4                  87076    8.6%
5                  39608    3.9%
6                  45444    4.5%
7                  59434    5.8%
8                  36186    3.6%
9                  114254   11.2%
10                 55622    5.5%

Largest values

Value              Count    Frequency (%)
12                 57896    5.7%
11                 84455    8.3%
10                 55622    5.5%
9                  114254   11.2%
8                  36186    3.6%
7                  59434    5.8%
6                  45444    4.5%
5                  39608    3.9%
4                  87076    8.6%
3                  63548    6.2%

CompetitionOpenSinceYear
Real number (ℝ)

Alerts: Missing

Distinct: 23
Distinct (%): < 0.1%
Missing: 323348
Missing (%): 31.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 2008.6902
Minimum: 1900
Maximum: 2015
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.8 MiB

Quantile statistics

Minimum: 1900
5-th percentile: 2001
Q1: 2006
median: 2010
Q3: 2013
95-th percentile: 2015
Maximum: 2015
Range: 115
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 5.9926444
Coefficient of variation (CV): 0.0029833592
Kurtosis: 121.93467
Mean: 2008.6902
Median Absolute Deviation (MAD): 3
Skewness: -7.5395149
Sum: 1.3937518 × 10^9
Variance: 35.911787
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=23)]

Most frequent values

Value               Count    Frequency (%)
2013                75426    7.4%
2012                74299    7.3%
2014                63732    6.3%
2005                56564    5.6%
2010                51258    5.0%
2011                49396    4.9%
2009                49396    4.9%
2008                48476    4.8%
2007                43744    4.3%
2006                42802    4.2%
Other values (13)   138768   13.6%
(Missing)           323348   31.8%

Smallest values

Value               Count    Frequency (%)
1900                758      0.1%
1961                942      0.1%
1990                4710     0.5%
1994                1884     0.2%
1995                1700     0.2%
1998                942      0.1%
1999                7352     0.7%
2000                9236     0.9%
2001                14704    1.4%
2002                24882    2.4%

Largest values

Value               Count    Frequency (%)
2015                35060    3.4%
2014                63732    6.3%
2013                75426    7.4%
2012                74299    7.3%
2011                49396    4.9%
2010                51258    5.0%
2009                49396    4.9%
2008                48476    4.8%
2007                43744    4.3%
2006                42802    4.2%
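
The large negative skewness and the kurtosis above 100 for CompetitionOpenSinceYear come from a handful of very early years such as 1900. One common way to flag them, which the report itself does not apply, is Tukey's fences at 1.5 × IQR beyond the quartiles. The sketch below applies that convention to the reported quartiles and the smallest-values table; `tukey_fences` is an illustrative helper and the multiplier 1.5 is an assumption.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Year -> row count, copied from the ten smallest values in the report.
smallest_years = {
    1900: 758, 1961: 942, 1990: 4710, 1994: 1884, 1995: 1700,
    1998: 942, 1999: 7352, 2000: 9236, 2001: 14704, 2002: 24882,
}

lower, upper = tukey_fences(q1=2006, q3=2013)
flagged = {year: n for year, n in smallest_years.items() if year < lower}

print(lower, upper)          # 1995.5 2023.5
print(sorted(flagged))       # [1900, 1961, 1990, 1994, 1995]
print(sum(flagged.values())) # 9994
```

The upper fence lies above the reported maximum of 2015, so only the early years are flagged. Whether 1990 to 1995 are real opening years or data-entry artefacts cannot be decided from the report alone.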

Promo2
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.8 MiB
Category counts: 1 (509178), 0 (508031)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1017209
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 1
4th row: 0
5th row: 0

Common Values

Value   Count    Frequency (%)
1       509178   50.1%
0       508031   49.9%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value   Count    Frequency (%)
1       509178   50.1%
0       508031   49.9%

Most occurring characters

Value   Count    Frequency (%)
1       509178   50.1%
0       508031   49.9%

Most occurring categories

Value            Count     Frequency (%)
Decimal Number   1017209   100.0%

Most frequent character per category

Decimal Number
Value   Count    Frequency (%)
1       509178   50.1%
0       508031   49.9%

Most occurring scripts

Value    Count     Frequency (%)
Common   1017209   100.0%

Most frequent character per script

Common
Value   Count    Frequency (%)
1       509178   50.1%
0       508031   49.9%

Most occurring blocks

Value   Count     Frequency (%)
ASCII   1017209   100.0%

Most frequent character per block

ASCII
Value   Count    Frequency (%)
1       509178   50.1%
0       508031   49.9%

Promo2SinceWeek
Real number (ℝ)

Alerts: Missing

Distinct: 24
Distinct (%): < 0.1%
Missing: 508031
Missing (%): 49.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.269093
Minimum: 1
Maximum: 50
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.8 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 13
median: 22
Q3: 37
95-th percentile: 45
Maximum: 50
Range: 49
Interquartile range (IQR): 24

Descriptive statistics

Standard deviation: 14.095973
Coefficient of variation (CV): 0.60578093
Kurtosis: -1.3699286
Mean: 23.269093
Median Absolute Deviation (MAD): 13
Skewness: 0.10452752
Sum: 11848110
Variance: 198.69644
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=24)]

Most frequent values

Value               Count    Frequency (%)
14                  72990    7.2%
40                  62598    6.2%
31                  39976    3.9%
10                  38828    3.8%
5                   35818    3.5%
37                  32786    3.2%
1                   32418    3.2%
13                  29820    2.9%
45                  29268    2.9%
22                  28694    2.8%
Other values (14)   105982   10.4%
(Missing)           508031   49.9%

Smallest values

Value               Count    Frequency (%)
1                   32418    3.2%
5                   35818    3.5%
6                   942      0.1%
9                   12452    1.2%
10                  38828    3.8%
13                  29820    2.9%
14                  72990    7.2%
18                  27318    2.7%
22                  28694    2.8%
23                  4342     0.4%

Largest values

Value               Count    Frequency (%)
50                  942      0.1%
49                  758      0.1%
48                  8294     0.8%
45                  29268    2.9%
44                  2642     0.3%
40                  62598    6.2%
39                  4732     0.5%
37                  32786    3.2%
36                  9236     0.9%
35                  22814    2.2%

Promo2SinceYear
Real number (ℝ)

Alerts: Missing

Distinct: 7
Distinct (%): < 0.1%
Missing: 508031
Missing (%): 49.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 2011.7528
Minimum: 2009
Maximum: 2015
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.8 MiB

Quantile statistics

Minimum: 2009
5-th percentile: 2009
Q1: 2011
median: 2012
Q3: 2013
95-th percentile: 2014
Maximum: 2015
Range: 6
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.6628704
Coefficient of variation (CV): 0.00082657792
Kurtosis: -1.0406623
Mean: 2011.7528
Median Absolute Deviation (MAD): 1
Skewness: -0.12005992
Sum: 1.0243403 × 10^9
Variance: 2.7651381
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=7)]

Most frequent values

Value       Count    Frequency (%)
2011        115056   11.3%
2013        110464   10.9%
2014        79922    7.9%
2012        73174    7.2%
2009        65270    6.4%
2010        56240    5.5%
2015        9052     0.9%
(Missing)   508031   49.9%

Smallest values

Value       Count    Frequency (%)
2009        65270    6.4%
2010        56240    5.5%
2011        115056   11.3%
2012        73174    7.2%
2013        110464   10.9%
2014        79922    7.9%
2015        9052     0.9%

Largest values

Value       Count    Frequency (%)
2015        9052     0.9%
2014        79922    7.9%
2013        110464   10.9%
2012        73174    7.2%
2011        115056   11.3%
2010        56240    5.5%
2009        65270    6.4%

PromoInterval
Categorical

Alerts: Missing

Distinct: 3
Distinct (%): < 0.1%
Missing: 508031
Missing (%): 49.9%
Memory size: 7.8 MiB
Category counts: Jan,Apr,Jul,Oct (293122), Feb,May,Aug,Nov (118596), Mar,Jun,Sept,Dec (97460)

Length

Max length: 16
Median length: 15
Mean length: 15.191407
Min length: 15

Characters and Unicode

Total characters: 7735130
Distinct characters: 23
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Jan,Apr,Jul,Oct
2nd row: Jan,Apr,Jul,Oct
3rd row: Jan,Apr,Jul,Oct
4th row: Jan,Apr,Jul,Oct
5th row: Feb,May,Aug,Nov

Common Values

Value              Count    Frequency (%)
Jan,Apr,Jul,Oct    293122   28.8%
Feb,May,Aug,Nov    118596   11.7%
Mar,Jun,Sept,Dec   97460    9.6%
(Missing)          508031   49.9%
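
The length and character totals reported for PromoInterval follow directly from the three category strings and their counts. The sketch below recomputes them from the table above, assuming missing values are excluded from the length statistics.

```python
# Category -> count, copied from the Common Values table.
interval_counts = {
    "Jan,Apr,Jul,Oct": 293_122,
    "Feb,May,Aug,Nov": 118_596,
    "Mar,Jun,Sept,Dec": 97_460,
}

# Assumption: missing values do not enter the length statistics.
non_missing = sum(interval_counts.values())
total_characters = sum(len(text) * n for text, n in interval_counts.items())
mean_length = total_characters / non_missing
comma_count = sum(text.count(",") * n for text, n in interval_counts.items())

print(non_missing)            # 509178
print(total_characters)       # 7735130
print(round(mean_length, 4))
print(comma_count)            # 1527534
```

The only string longer than 15 characters is the one containing "Sept", which is why the maximum length is 16 and the mean sits slightly above 15.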

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value              Count    Frequency (%)
jan,apr,jul,oct    293122   57.6%
feb,may,aug,nov    118596   23.3%
mar,jun,sept,dec   97460    19.1%

Most occurring characters

Value               Count     Frequency (%)
,                   1527534   19.7%
J                   683704    8.8%
u                   509178    6.6%
a                   509178    6.6%
A                   411718    5.3%
c                   390582    5.0%
t                   390582    5.0%
r                   390582    5.0%
p                   390582    5.0%
n                   390582    5.0%
Other values (13)   2140908   27.7%

Most occurring categories

Value               Count     Frequency (%)
Lowercase Letter    4170884   53.9%
Uppercase Letter    2036712   26.3%
Other Punctuation   1527534   19.7%

Most frequent character per category

Lowercase Letter
Value              Count    Frequency (%)
u                  509178   12.2%
a                  509178   12.2%
c                  390582   9.4%
t                  390582   9.4%
r                  390582   9.4%
p                  390582   9.4%
n                  390582   9.4%
e                  313516   7.5%
l                  293122   7.0%
b                  118596   2.8%
Other values (4)   474384   11.4%

Uppercase Letter
Value   Count    Frequency (%)
J       683704   33.6%
A       411718   20.2%
O       293122   14.4%
M       216056   10.6%
F       118596   5.8%
N       118596   5.8%
S       97460    4.8%
D       97460    4.8%

Other Punctuation
Value   Count     Frequency (%)
,       1527534   100.0%

Most occurring scripts

Value    Count     Frequency (%)
Latin    6207596   80.3%
Common   1527534   19.7%

Most frequent character per script

Latin
Value               Count     Frequency (%)
J                   683704    11.0%
u                   509178    8.2%
a                   509178    8.2%
A                   411718    6.6%
c                   390582    6.3%
t                   390582    6.3%
r                   390582    6.3%
p                   390582    6.3%
n                   390582    6.3%
e                   313516    5.1%
Other values (12)   1827392   29.4%

Common
Value   Count     Frequency (%)
,       1527534   100.0%

Most occurring blocks

Value   Count     Frequency (%)
ASCII   7735130   100.0%

Most frequent character per block

ASCII
Value               Count     Frequency (%)
,                   1527534   19.7%
J                   683704    8.8%
u                   509178    6.6%
a                   509178    6.6%
A                   411718    5.3%
c                   390582    5.0%
t                   390582    5.0%
r                   390582    5.0%
p                   390582    5.0%
n                   390582    5.0%
Other values (13)   2140908   27.7%
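
Each PromoInterval value is a comma-separated list of month abbreviations, and September appears as "Sept" rather than the three-letter "Sep" that most date libraries expect. A minimal sketch of turning a value into month numbers, assuming each token names a calendar month; `interval_months` and the lookup table are illustrative names, and the report does not define what the intervals mean.

```python
MONTH_TOKENS = [
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
]
MONTH_NUMBER = {token: i for i, token in enumerate(MONTH_TOKENS, start=1)}
MONTH_NUMBER["Sept"] = 9  # the data spells September as "Sept"

def interval_months(promo_interval):
    """Map a PromoInterval string to a set of month numbers.

    None (a missing value) maps to the empty set.
    """
    if promo_interval is None:
        return frozenset()
    return frozenset(MONTH_NUMBER[token] for token in promo_interval.split(","))

for value in ("Jan,Apr,Jul,Oct", "Feb,May,Aug,Nov", "Mar,Jun,Sept,Dec", None):
    print(value, sorted(interval_months(value)))
```

Taken together, the three observed values cover each of the twelve months exactly once.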

Interactions

[Figures: 81 pairwise interaction plots; images not preserved]

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
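
Nullity correlation can be illustrated as the Pearson correlation between per-column "is missing" indicators. The sketch below uses four rows taken from the Sample section (stores 1, 2, 1113 and 1115) and three of the columns with missing values. It is a toy illustration of the idea, not a reproduction of the heatmap, and the exact method behind the heatmap is an assumption.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Four rows from the Sample section; None marks a missing value (NaN).
rows = [
    {"CompetitionOpenSinceYear": 2008.0, "Promo2SinceWeek": None, "Promo2SinceYear": None},
    {"CompetitionOpenSinceYear": 2007.0, "Promo2SinceWeek": 13.0, "Promo2SinceYear": 2010.0},
    {"CompetitionOpenSinceYear": None, "Promo2SinceWeek": None, "Promo2SinceYear": None},
    {"CompetitionOpenSinceYear": None, "Promo2SinceWeek": 22.0, "Promo2SinceYear": 2012.0},
]

def nullity(column):
    """1 where the column is missing, 0 where it is present."""
    return [1 if row[column] is None else 0 for row in rows]

same_block = pearson(nullity("Promo2SinceWeek"), nullity("Promo2SinceYear"))
other_block = pearson(nullity("Promo2SinceWeek"), nullity("CompetitionOpenSinceYear"))
print(same_block)   # 1.0
print(other_block)  # 0.0
```

In these four rows the two Promo2 columns are always missing together, while their missingness is unrelated to that of CompetitionOpenSinceYear. The identical missing counts of 508031 for the Promo2 columns suggest the first pattern holds across the full dataset.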

Sample

Index    Store  DayOfWeek  Date        Sales  Customers  Open  Promo  StateHoliday  SchoolHoliday  StoreType  Assortment  CompetitionDistance  CompetitionOpenSinceMonth  CompetitionOpenSinceYear  Promo2  Promo2SinceWeek  Promo2SinceYear  PromoInterval
0        1      5          2015-07-31  5263   555        1     1      0             1              c          a           1270.0               9.0                        2008.0                    0       NaN              NaN              NaN
1        2      5          2015-07-31  6064   625        1     1      0             1              a          a           570.0                11.0                       2007.0                    1       13.0             2010.0           Jan,Apr,Jul,Oct
2        3      5          2015-07-31  8314   821        1     1      0             1              a          a           14130.0              12.0                       2006.0                    1       14.0             2011.0           Jan,Apr,Jul,Oct
3        4      5          2015-07-31  13995  1498       1     1      0             1              c          c           620.0                9.0                        2009.0                    0       NaN              NaN              NaN
4        5      5          2015-07-31  4822   559        1     1      0             1              a          a           29910.0              4.0                        2015.0                    0       NaN              NaN              NaN
5        6      5          2015-07-31  5651   589        1     1      0             1              a          a           310.0                12.0                       2013.0                    0       NaN              NaN              NaN
6        7      5          2015-07-31  15344  1414       1     1      0             1              a          c           24000.0              4.0                        2013.0                    0       NaN              NaN              NaN
7        8      5          2015-07-31  8492   833        1     1      0             1              a          a           7520.0               10.0                       2014.0                    0       NaN              NaN              NaN
8        9      5          2015-07-31  8565   687        1     1      0             1              a          c           2030.0               8.0                        2000.0                    0       NaN              NaN              NaN
9        10     5          2015-07-31  7185   681        1     1      0             1              a          a           3160.0               9.0                        2009.0                    0       NaN              NaN              NaN

Index    Store  DayOfWeek  Date        Sales  Customers  Open  Promo  StateHoliday  SchoolHoliday  StoreType  Assortment  CompetitionDistance  CompetitionOpenSinceMonth  CompetitionOpenSinceYear  Promo2  Promo2SinceWeek  Promo2SinceYear  PromoInterval
1017199  1106   2          2013-01-01  0      0          0     0      a             1              a          c           5330.0               9.0                        2011.0                    1       31.0             2013.0           Jan,Apr,Jul,Oct
1017200  1107   2          2013-01-01  0      0          0     0      a             1              a          a           1400.0               6.0                        2012.0                    1       13.0             2010.0           Jan,Apr,Jul,Oct
1017201  1108   2          2013-01-01  0      0          0     0      a             1              a          a           540.0                4.0                        2004.0                    0       NaN              NaN              NaN
1017202  1109   2          2013-01-01  0      0          0     0      a             1              c          a           3490.0               4.0                        2011.0                    1       22.0             2012.0           Jan,Apr,Jul,Oct
1017203  1110   2          2013-01-01  0      0          0     0      a             1              c          c           900.0                9.0                        2010.0                    0       NaN              NaN              NaN
1017204  1111   2          2013-01-01  0      0          0     0      a             1              a          a           1900.0               6.0                        2014.0                    1       31.0             2013.0           Jan,Apr,Jul,Oct
1017205  1112   2          2013-01-01  0      0          0     0      a             1              c          c           1880.0               4.0                        2006.0                    0       NaN              NaN              NaN
1017206  1113   2          2013-01-01  0      0          0     0      a             1              a          c           9260.0               NaN                        NaN                       0       NaN              NaN              NaN
1017207  1114   2          2013-01-01  0      0          0     0      a             1              a          c           870.0                NaN                        NaN                       0       NaN              NaN              NaN
1017208  1115   2          2013-01-01  0      0          0     0      a             1              d          c           5350.0               NaN                        NaN                       1       22.0             2012.0           Mar,Jun,Sept,Dec